Categorial interactions

Scarr-Rowe

We are going to use the Scarr-Rowe effect (or hypothesis) to look more closely on interaction effects. The Scarr-Rowe hypothesis states that the (genetic) heritability of a trait depends on the environment, such that the effects of genes are larger when environments are better. The underlying idea is that if everyone lives in a perfect environment, i.e. there is no variation in the relevant environment, then a trait will only depend on genes.

This interaction can be visualized as follows:

par(mar=c(3,3,.5,.5), mgp=c(1.75,.5,0), tck=-.01)
heritability = c(scarce = 0.5, plentiful = 0.8)
barplot(heritability,
        ylim = c(0,1),
        xlab = "environment",
        ylab = "heritability (effect of genes)")

Here is a DAG that describes such a model, where

  • A = additive genetic effects
  • C = common effects in a family
  • E = idiosyncratic effects and measurement error

and the Scarr-Rowe effect means the the coefficient of the path \(\small A \rightarrow IQ\) depends on \(\small SES\).

library(dagitty) 
dag = dagitty(
  "dag{
  IQ;
  A -> IQ;
  C -> IQ;
  E -> IQ;
  SES -> IQ;
  }")
coord.list = 
  list(
    x=c(A=1,C=2,E=3,SES=2,IQ=2),
    y=c(A=-1,C=-1,E=-1,SES=1,IQ=0))
coordinates(dag) = coord.list
drawdag(dag, cex = 1.5)

Note that DAGs do not encode interaction effects by drawing an arrow from the moderator to the relevant path. Instead, there is a path from the moderator to the outcome variable. So the DAG only tells us that IQ is a function of four other variables \(\small IQ = f(A,C,E,SES)\), but it does not tell us what the function \(f()\) is.

This could be

  • \(\small IQ = A + C + E + SES\),
  • \(\small IQ = A\,C\,E + SES\) or
  • \(\small IQ = A \, SES + C \, SES + E\),
  • \(\small IQ = A \, SES + log(C) \, SES + E\)

or any other imaginable function that uses the variables. The model we have in mind now is:

\[ IQ = f_1(SES) A + f_2(SES) C + E \]

Lets simulate data that show the the Scarr-Rowe effect, first our exogenous variables. To keep things simple, we assume that all variables are normally distributed:

set.seed(3)
N = 250
SES = rnorm(N,mean = 5,1)
A = rnorm(N,mean = 10,2)
C = rnorm(N,mean = 10,2)
E = rnorm(N, sd = 7)

Now comes the interesting part: Scarr-Rowe assumes that the effect or weight of genes depends on the SES. So we formulate the weight of genes as as a function of SES. For good measure, we also let the effect of the familial environment vary by SES.

library(boot)
# we use inv.logit keep to give weights a lower and upper bound
b_A = function(SES) 1 + inv.logit(SES-5)*3
b_C = function(SES) inv.logit(-SES+5)

We are literally defining the weight for A as a function of SES. This is one way to understand interactions. Here is a visualization of the weights:

par(mar=c(3,3,.5,.5), mgp=c(1.75,.5,0), tck=-.01)
curve(b_A(x),2,8, y = expression("effect size"~beta), 
      xlab = "SES", ylim = c(0,4), col = "red")
curve(b_C(x),2,8, xlab = "SES", add = T, col = "blue")
text(c(5,5),
     c(b_A(5), b_C(5)),
     labels = c(expression(beta[A]),
                expression(beta[C])), pos = 3)

In this plot, the interaction is not encoded by the fact that \(\small \beta_A\) and \(\small \beta_C\) have different slopes, but by the fact that both have a slope in the first place. The figure shows two interactions, because the effect size for \(\small A\) and \(\small C\) changes with SES.

And now we can simulate IQ values using exogenous variables and derived weights. We assume that IQ depends on familial factors \(\small C\), additive genetic effects \(\small A\): \(\small IQ = f_1(SES) A + f_2(SES) C + E\). Lets simulate data from this model:

# Adding 70 to get an IQ around 100
IQ = 70 + b_C(SES)*C + b_A(SES)*A + E
hist(IQ)

If we just look at the bivariate associations between \(\small A\) or \(\small SES\) and \(\small IQ\), we get the following plot:

par(mfrow = c(1,2),mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(SES,IQ, pch = 16, cex = .5,)
#abline(lm(IQ~SES))
plot(A,IQ, pch = 16, cex = .5,)

#abline(lm(IQ~A))
# plot(C,IQ)
# abline(lm(IQ~C))

To run a simple interaction with a categorical interaction variable we use only a subset of our sample, the top and lower 30%, and code a categorical SES variables SES.c:

low.SES = which(SES < quantile(SES,.30))
high.SES = which(SES > quantile(SES,.70))
idx = c(low.SES,high.SES)
dt = data.frame(
  A = A[idx],
  C = C[idx],
  IQ = IQ[idx],
  SES = SES[idx],
  SES.c = c(rep(1,length(low.SES)),
            rep(2,length(high.SES))))

Lets plot the association between \(\small A\) and \(\small IQ\) again, this time split by SES:

low.SES = dt$SES.c == 1
high.SES = dt$SES.c == 2
plot_data = function(dt) {
  par(mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
  plot(IQ ~ A, data = dt, col = dt$SES.c, pch = 16, cex = .5,
       ylim = range(dt$IQ), ylab = "IQ", xlab = "A")
  legend("topleft", pch = 16, col = c("black","red"),
         legend = c("low SES","high SES"), bty = "n")
  
}
plot_data(dt)

#abline(lm(IQ ~ A, data = dt[low.SES,]), col = "black")
#abline(lm(IQ ~ A, data = dt[high.SES,]), col = "red")

Next, we estimate some linear regression models with increasing complexity, our goal being to have a model that accurately depicts the effect of A, C, E, and SES on the IQ.

Simple linear regression

We start with a simple linear regression. But first some intuitions on priors:

  • Eyeballing the data, we see that the IQ at an average \(\small A\) of 10 is around 100, so we use a prior a ~ dnorm(100,5)
  • The range of \(\small A\) is 16-6 = 10 and the range of IQ = 120-80 = 40. So for a one unit increase of \(\small A\), IQ changes around 40/10 = 4. If we want that an effect of +/-4 is at 1 sd of the prior, we set the prior for the effect of \(\small A\) to a ~ dnorm(0,4). This prior allows for the possibility of a negative effect of \(\small A\). Note that for this prior to work
  • Lacking a strong intuition for the error variance, we set the prior for the variance to a generous dexp(0.25)
IQ.A = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a + b*(A - 10),
      a ~ dnorm(100,5),
      b ~ dnorm(0,4),
      sigma ~ dexp(.5)
    ),
    data=dt)

Lets quickly look at the prior predictions to make sure the piors are OK.

prior = extract.prior(IQ.A)
A.seq = seq(from=0,to=20,length.out=50)
mu = link(
  IQ.A,
  post=prior,
  data=data.frame(A=A.seq))

par(mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(
  0,type = "n", ylab = "IQ", xlab = "A", 
  xlim = c(min(A),max(A)),
  ylim = c(70,130))
matlines(
  A.seq,t(mu[1:100,]),type = "l", lty = 1,
  col = adjustcolor("blue", alpha = .5))

Yes, this looks good.

Here are the posterior predictions:

plot_data(dt)
plot_mu.CIs(IQ.A, data.frame(A=A.seq), "blue", spaghetti = TRUE)

Linear regression with category-main effect

To fit a model with a main effect of education, we use the indexing approach:

set.seed(1)
IQ.A_SES = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a[SES.c] + b*(A - 10),
      a[SES.c] ~ dnorm(100,5),
      b ~ dnorm(0,4),
      sigma ~ dexp(.5)
    ),
    data=dt)
plot_data(dt)
plot_mu.CIs(IQ.A_SES, data.frame(A=A.seq, SES.c = 1), "black")
plot_mu.CIs(IQ.A_SES, data.frame(A=A.seq, SES.c = 2), "red")

We can see already from the plot the the model with separate means for high and low SES is better. But here is the comparison with PSIS-LOO:

compare(
  IQ.A,IQ.A_SES,
  func = "PSIS") %>% 
  round(2)
##             PSIS    SE dPSIS   dSE pPSIS weight
## IQ.A_SES 1014.17 14.62  0.00    NA  4.07      1
## IQ.A     1077.78 15.52 63.61 12.53  3.13      0

Linear regression with category-main effect and category-slope

Lastly, we can also let the slopes vary by SES:

IQ.AxSES = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a[SES.c] + b[SES.c]*(A - 10),
      a[SES.c] ~ dnorm(100,5),
      b[SES.c] ~ dnorm(0,2),
      sigma ~ dexp(.5)
    ),
    data=dt)

And again the posterior predictions:

par(mar=c(3,3,2,.5), mgp=c(1.75,.5,0), tck=-.01)
plot_data(dt)
plot_mu.CIs(IQ.AxSES, data.frame(A=A.seq, SES.c = 1), "black")
plot_mu.CIs(IQ.AxSES, data.frame(A=A.seq, SES.c = 2), "red")

Even though the DGP does not have different “intercepts” for high and log SES, we have to add it, because our model puts the intercept at A = 10, whereas it was A = 0 in the DGP.

Here is a comparison of all the models we have fit:

compare(
  IQ.A, IQ.A_SES, IQ.AxSES,
  func = "PSIS") %>% 
  round(2)
##             PSIS    SE dPSIS   dSE pPSIS weight
## IQ.AxSES 1007.85 16.20  0.00    NA  5.22   0.97
## IQ.A_SES 1014.70 14.69  6.85  6.79  4.30   0.03
## IQ.A     1077.95 15.61 70.11 14.00  3.20   0.00

We get a warning about Pareto k values being higher than 0.5, but it is (more or less) OK if k is not larger than 0.7 or 1.

So how can we figure out if the difference in slopes is reliably larger than 0? We simply extract posteriors and make some calculations with them:

post = extract.samples(IQ.AxSES)
names(post)
## [1] "sigma" "a"     "b"
head(post$b)
##           [,1]     [,2]
## [1,] 1.1506010 2.980423
## [2,] 1.4792998 2.571067
## [3,] 1.1977443 2.863966
## [4,] 0.8793765 2.514056
## [5,] 1.0415250 3.043657
## [6,] 1.6412887 2.227408

We simply calculate the differences of the two b parameters:

delta.b = post$b[,2]-post$b[,1]

And now we can show e.g. mean and PIs:

c(mean = mean(delta.b),
  PI(delta.b,prob = c(.9)))
##     mean       5%      95% 
## 1.548680 0.700257 2.401483

And we can plot a histogram of the contrast:

hist(delta.b, xlim = range(c(0,delta.b)))
abline(v = 0, lty = 2)
abline(v = PI(delta.b,prob = c(.95)), col = "red")

To see if our results are reasonable, lets compare our estimated parameters with the parameters governing the DGP. First the model parameters:

precis(IQ.AxSES, depth = 2) %>% round(2)
##         mean   sd   5.5%  94.5%
## a[1]   95.40 0.78  94.16  96.64
## a[2]  105.25 0.76 104.03 106.47
## b[1]    1.34 0.36   0.76   1.92
## b[2]    2.89 0.37   2.29   3.48
## sigma   6.65 0.38   6.05   7.25

Now the parameters from the DGP:

low.SES = which(SES < quantile(SES,.30))
high.SES = which(SES > quantile(SES,.70))
rbind(
  b.low.SES = b_A(SES[low.SES]) %>% mean(), 
  b.high.SES = b_A(SES[high.SES]) %>% mean())
##                [,1]
## b.low.SES  1.760083
## b.high.SES 3.275656

We are not recovering the exact parameters, after all we used a golem instead of a model that depicts the DGP, but qualitatively the results are consistent with the DGP.

Symetric interactions

Earlier, we described this function.

\[ IQ = f_1(SES) A + ... \]

By saying that the effect \(\small A\) is a function of \(\small SES\). However, we really just have to variables: \(\small A\) and \(\small f_1(SES)\) which are multiplied with each other. Therefore, these statements are equivalent:

  • The effect of \(\small A\) depends on \(\small f_1(SES)\)
  • The effect of \(\small f_1(SES)\) depends on \(\small A\)

Accordingly, we can also plot the difference between high and low \(\small SES\) as a function of \(\small A\):

A.seq = seq(from=5,to=16,length.out=30)
mu.low = link(IQ.AxSES,data=data.frame(SES.c=1,A=A.seq))
mu.high = link(IQ.AxSES,data=data.frame(SES.c=2,A=A.seq))
delta = mu.high-mu.low
CIs = apply(delta,2,PI)
par(mar=c(3,3,2,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(A.seq, colMeans(delta),'l',
     ylim = range(CIs),
     col = "blue",
     xlab = "A",
     ylab = expression(IQ[high~SES]~-~IQ[low~SES]))
shade(CIs,A.seq, col = adjustcolor("blue", alpha = .25))
abline(h = 0, lty = 2)
text(12,0,"higher IQ with lower SES", pos = 1, adj = 0)
text(12,0,"higher IQ with higher SES", pos = 3, adj = 0)

Continuous interactions

We are continuing with our simulated Scarr-Rowe data set, but this time we use the full data and formulate a continuous interaction.

dt = data.frame(
  A = A,
  C = C,
  IQ = IQ,
  SES = SES)

Plotting the data

You can try it the fancy way and make a 3d plot, but in this instance its a lot of effort for meager results:

library(plot3D)
points3D(A,SES,IQ, type = "h",
         col = "black",
         cex = .75,
         lty = 3,
         pch = 16,
         phi = 20,
         theta = 45,
         xlab = "A",
         ylab = "SES",
         zlab = "IQ")

A panel of 2d plots does the job better, and will later allow to display uncertainty.

qs = quantile(SES, probs = seq(0,1,.25))
par(mfrow = c(1,4), mar=c(2.5,2.5,2,.5), mgp=c(1.5,.5,0), tck=-.01)
for (k in 2:length(qs)) {
  idx = which(dt$SES > qs[k-1] & dt$SES < qs[k])
  tmp.dt = dt[idx,]
  with(tmp.dt,
       plot(IQ~A, pch = 16, main = paste0(k-1,". quantile SES"),
       ylim = range(dt$IQ), xlim = range(dt$A)))
}

Simple linear regression

Following the specification of our DGP, we can now describe interaction in the quap model as a simple multiplication:

IQc.A_SES = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a + ba*(A - 10) + bs*(SES - 2.5),
      a ~ dnorm(100,10),
      ba ~ dnorm(0,4),
      bs ~ dnorm(0,4),
      sigma ~ dexp(.5)
    ),
    data=dt)

We are using prior predictions to check if the model does a good job:

plot.pred(IQc.A_SES,dt, type = "prior")

This is pretty wild. Lets try a bit narrowers priors:

set.seed(1)
IQc.A_SES = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a + ba*(A - 10) + bs*(SES - 2.5),
      a ~ dnorm(100,7.5),
      ba ~ dnorm(0,2),
      bs ~ dnorm(0,2),
      sigma ~ dexp(.5)
    ),
    data=dt)
plot.pred(IQc.A_SES,dt, type = "prior")

These priors are reasonable, so we look at the predictions:

plot.pred(IQc.A_SES,dt)

As expected from the model we specified, all slopes are the same. Note that the different “intercepts” we seem to come from this part of the model: mu <- ... + bs*(SES - 2.5),.

Linear regression with main effects and interaction

The model is similar to the previous, but adds this interaction term: bsa*(A - 10)*(SES - 2.5).

IQc.AxSES.m = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a + ba*(A - 10) + bs*(SES - 2.5) + bsa*(A - 10)*(SES - 2.5),
      a ~ dnorm(100,7.5),
      ba ~ dnorm(0,2),
      bs ~ dnorm(0,2),
      bsa ~ dnorm(0,1),
      sigma ~ dexp(.5)
    ),
    data=dt)
plot.pred(IQc.AxSES.m,dt, type = "prior")

This priors look OK (even though slopes become extreme at high SES values), so we plot the model predictions:

plot.pred(IQc.AxSES.m,dt)

Linear regression with only interaction

Here we have removed the main effects ba*(A - 10) + bs*(SES - 2.5) from the previous model.

IQc.AxSES = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a + bsa*(A - 10)*(SES - 2.5),
      a ~ dnorm(100,7.5),
      bsa ~ dnorm(0,2),
      sigma ~ dexp(.5)
    ),
    data=dt)
plot.pred(IQc.AxSES,dt, type = "prior")

Because the model is simpler, the posterior predictions are more constraint, but still very variable. Here are the model predictions:

plot.pred(IQc.AxSES,dt)

This model cannot capture that the average IQ is higher at higher SES values. The next model adresses this:

Linear regression with SES main effect and interaction

IQc.AxSES.mSES = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a + bs*(SES - 2.5) + bsa*(A - 10)*(SES - 2.5),
      a ~ dnorm(100,7.5),
      bsa ~ dnorm(0,2),
      bs ~ dnorm(0,2),
      sigma ~ dexp(.5)
    ),
    data=dt)
plot.pred(IQc.AxSES.mSES,dt, type = "prior")

Because the model is simpler, the posterior predictions are more constraint, but still very variable. Here are the model predictions:

plot.pred(IQc.AxSES.mSES,dt)

Model comparison

compare(IQc.A_SES,IQc.AxSES.m,IQc.AxSES,IQc.AxSES.mSES)
##                    WAIC       SE     dWAIC        dSE    pWAIC       weight
## IQc.AxSES.mSES 1687.422 22.52193  0.000000         NA 4.051201 7.205434e-01
## IQc.AxSES.m    1689.331 22.54655  1.908535  0.5764573 5.021816 2.774771e-01
## IQc.A_SES      1699.216 21.64378 11.794331  8.2352022 4.148111 1.979492e-03
## IQc.AxSES      1753.033 21.16833 65.611275 14.3985290 2.664181 4.077115e-15